Disputation
16 April 2024
University of Mannheim
Which methods can we use to classify data from open-ended survey questions?
Can we leverage these methods to make empirical contributions to substantive questions?
Motivation:
➡️ The increase in methods to collect natural language (e.g., smartphone surveys and voice technologies) calls for testing and validating automated methods to analyze the resulting data.
➡️ Open-ended survey answers pose a unique challenge for ML applications due to their shortness and lack of context. Effective analysis may require suitable methods, e.g., word embeddings or structural topic models.
Figure 1: The previous question was: ‘How often can you trust the federal government in Washington to do what is right?’. Your answer was: ‘[Always; Most of the time; About half of the time; Some of the time; Never; Don’t Know]’. In your own words, please explain why you selected this answer.
Table 1. Overview of methods for classifying open-ended survey responses
| Study 1 | Study 2 | Study 3 |
|---|---|---|
| How valid are trust survey measures? New insights from open-ended probing data and supervised machine learning | Open-ended survey questions: A comparison of information content in text and audio response format | Asking Why: Is there an Affective Component of Political Trust Ratings in Surveys? |
Landesvatter, C., & Bauer, P. C. (2024). How Valid Are Trust Survey Measures? New Insights From Open-Ended Probing Data and Supervised Machine Learning. Sociological Methods & Research, 0(0). https://doi.org/10.1177/00491241241234871
Landesvatter, C., & Bauer, P. C. (February 2024). Open-ended survey questions: A comparison of information content in text and audio response formats. Working Paper, submitted to Public Opinion Quarterly.
Landesvatter, C., & Bauer, P. C. (March 2024). Asking Why: Is there an Affective Component of Political Trust Ratings in Surveys? Working Paper, submitted to American Political Science Review.
Operationalization via sentiment and emotion analysis:
- Transcript-based
- Speech-based
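To make the transcript-based operationalization concrete, here is a minimal, illustrative sketch of lexicon-based sentiment scoring for short open-ended answers. The tiny word lists below are invented for demonstration only; the studies' actual operationalization of sentiment and emotion may use different lexicons or models.

```python
# Minimal lexicon-based sentiment score for short survey answers.
# The POSITIVE/NEGATIVE word lists are hypothetical examples, not the
# lexicons used in the studies.

POSITIVE = {"trust", "honest", "good", "fair", "reliable"}
NEGATIVE = {"corrupt", "lie", "bad", "distrust", "broken"}

def sentiment_score(answer: str) -> float:
    """Return (positive hits - negative hits) / token count; 0.0 if empty."""
    tokens = [t.strip(".,!?;:").lower() for t in answer.split()]
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)

print(sentiment_score("Politicians lie and the system is broken"))  # negative
print(sentiment_score("I trust them to be honest and fair"))        # positive
```

A speech-based variant would additionally draw on acoustic features (e.g., pitch, tempo) extracted from the audio recording rather than from the transcript alone.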
- Web surveys can be used to collect narrative answers that provide valuable insights into survey responses.
- Various modern developments (smartphone surveys, speech-to-text algorithms) can be leveraged to collect such data in innovative ways (e.g., spoken answers).
- Computational measures can inform ongoing debates in different fields by classifying open-ended answers from surveys, e.g.:
Facilitated accessibility and implementation of semi-automated methods.
Supervised models have been the standard in automated text analysis, but recent developments in large, general-purpose pre-trained models (e.g., BERT) allow less resource-intensive fine-tuning.
For example, using only ~13% of documents (1,000 of 7,500 in Study 1) for fine-tuning resulted in sufficient accuracy (87%).
Increasing possibilities for fully automated methods (e.g., prompt engineering).
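To illustrate the supervised-classification idea behind these points, here is a self-contained sketch: a multinomial Naive Bayes classifier trained on a handful of hand-labelled answers. The labels and example texts are invented for illustration; the studies themselves fine-tune large pre-trained models (e.g., BERT) on roughly 1,000 labelled documents rather than using this simple baseline.

```python
# Sketch of supervised classification of open-ended answers using a
# from-scratch multinomial Naive Bayes. Training data and label scheme
# below are hypothetical.

import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (text, label) pairs. Returns (label counts, word counts, vocab)."""
    counts = defaultdict(Counter)
    labels = Counter()
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        counts[label].update(tokens)
        labels[label] += 1
        vocab.update(tokens)
    return labels, counts, vocab

def predict(model, text):
    """Return the label with the highest log-posterior (Laplace smoothing)."""
    labels, counts, vocab = model
    total = sum(labels.values())
    best, best_lp = None, float("-inf")
    for label in labels:
        lp = math.log(labels[label] / total)          # class prior
        n = sum(counts[label].values())               # tokens in this class
        for tok in text.lower().split():
            lp += math.log((counts[label][tok] + 1) / (n + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Hypothetical hand-coded training answers (e.g., reasons for a trust rating)
docs = [
    ("they represent the people", "political"),
    ("elections keep them accountable", "political"),
    ("the economy is doing well", "performance"),
    ("they manage crises badly", "performance"),
]
model = train(docs)
print(predict(model, "the economy is badly managed"))  # -> performance
```

A fine-tuning workflow with a pre-trained model follows the same logic (labelled examples in, predicted codes out), but replaces the bag-of-words likelihoods with contextual representations, which is what makes small labelled sets like the ~1,000 documents above sufficient.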
Landesvatter: Methods for the Classification of Data from Open-Ended Questions in Surveys